Stateful Replication
Stateful replication is the process of replicating components or services that manage internal state or data (like databases, caches, session-aware services) to provide fault tolerance, high availability, and consistency.
Unlike replicas of a stateless service, each replica in a stateful system must maintain or synchronize internal state, which complicates replication, especially around consistency, failover, and synchronization.
Key Goals of Stateful Replication
- Ensure no data loss even during node failure
- Provide read/write availability across replicas
- Maintain data consistency across replicated instances
- Enable automatic failover and recovery
Components That Require Stateful Replication
- Databases (PostgreSQL, MySQL, Cassandra)
- Message Queues (Kafka, RabbitMQ)
- Stateful Microservices (e.g., real-time game servers)
- File Storage Systems (like HDFS, Ceph)
Common Replication Models for Stateful Systems
| Model | Description |
|---|---|
| Primary-Replica | One leader handles writes; replicas sync and serve reads (may lag behind). |
| Multi-Leader | Multiple nodes handle writes and must sync with each other (conflict-prone). |
| Quorum-based | A write is committed once a majority (quorum) of nodes accepts it, typically coordinated by a consensus protocol (e.g., Paxos, Raft). |
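The quorum-based model comes down to simple arithmetic. The sketch below assumes a plain majority quorum of the kind Raft-style protocols use; the function names `majority`, `tolerated_failures`, and `is_committed` are made up for this illustration and do not come from any library.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return cluster_size - majority(cluster_size)


def is_committed(acks: int, cluster_size: int) -> bool:
    """A write counts as committed once a majority has acknowledged it."""
    return acks >= majority(cluster_size)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

A 4-node cluster tolerates the same single failure as a 3-node cluster, which is one reason odd cluster sizes are common.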
Flow of Stateful Replication
- Client writes go to the primary node.
- Primary persists the state and sends updates to replicas.
- Replicas acknowledge the update.
- In case of primary failure, a replica is promoted as the new primary.
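The four steps above can be sketched as a toy in-memory model. This is only an illustration: `Node`, `Cluster`, and the "longest log wins" promotion rule are assumptions made for the example, not the behavior of any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)   # ordered updates applied so far
    alive: bool = True

    def apply(self, update) -> bool:
        """Apply an update and return True as the acknowledgement."""
        if not self.alive:
            return False
        self.log.append(update)
        return True


class Cluster:
    def __init__(self, primary: Node, replicas: list):
        self.primary = primary
        self.replicas = replicas

    def write(self, update) -> int:
        """Persist on the primary, forward to replicas, count acknowledgements."""
        if not self.primary.alive:
            raise RuntimeError("no primary available; run failover() first")
        self.primary.apply(update)
        return sum(1 for r in self.replicas if r.apply(update))

    def failover(self) -> Node:
        """Promote the live replica with the longest log (assumed rule)."""
        candidates = [r for r in self.replicas if r.alive]
        if not candidates:
            raise RuntimeError("no live replica to promote")
        new_primary = max(candidates, key=lambda r: len(r.log))
        self.replicas = [r for r in self.replicas if r is not new_primary]
        self.primary = new_primary
        return new_primary


cluster = Cluster(Node("primary"), [Node("replica-1"), Node("replica-2")])
acks = cluster.write({"key": "balance", "value": 100})
cluster.primary.alive = False          # simulate a primary crash
promoted = cluster.failover()
```

After the failover, writes go to the promoted node and are forwarded to the one remaining replica.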
Trade-offs of Stateful Replication
| Pros | Cons |
|---|---|
| Data durability & fault tolerance | Complexity in state synchronization |
| Enables high availability and backups | Risk of replication lag or inconsistency (async mode) |
| Resilient to node failures | Harder to scale than stateless systems |
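The replication-lag risk in asynchronous mode is easy to reproduce in miniature. In the sketch below, `Primary` and `AsyncReplica` are hypothetical classes; the replica applies updates only when `drain()` is called, which stands in for network and replay delay.

```python
from collections import deque


class AsyncReplica:
    """Replica that applies updates only when drain() is called."""

    def __init__(self):
        self.data = {}
        self.pending = deque()   # updates received but not yet applied

    def receive(self, key, value):
        self.pending.append((key, value))

    def drain(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value

    @property
    def lag(self) -> int:
        return len(self.pending)


class Primary:
    def __init__(self, replica: AsyncReplica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        self.data[key] = value            # committed locally
        self.replica.receive(key, value)  # shipped, but not waited for


replica = AsyncReplica()
primary = Primary(replica)
primary.write("cart", ["book"])

stale = replica.data.get("cart")   # read before the replica catches up
lag_before = replica.lag
replica.drain()
fresh = replica.data.get("cart")   # read after catching up
```

The first read misses the write even though the primary has already committed it. That window is what a client observes as replication lag.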
Technologies Supporting Stateful Replication
| System | Mechanism |
|---|---|
| PostgreSQL | Streaming replication of the write-ahead log (WAL) |
| Kafka | Partition leader + ISR replicas |
| Cassandra | Peer-to-peer with hinted handoff |
| Redis | Asynchronous primary-replica replication; AOF/RDB provide on-disk persistence |
| Raft-based DBs | Leader election, log replication |
Diagram of Stateful Replication
             +-------------------+
             |    Client App     |
             +-------------------+
                       |
                       v
            +---------------------+
            |    Primary Node     |  <-- handles all writes
            +---------------------+
                       ||
          Replication  ||  (state sync)
                       vv
    +----------------+    +----------------+
    | Replica Node 1 |    | Replica Node 2 |
    +----------------+    +----------------+
             (read-only or standby)
Example: PostgreSQL Streaming Replication
System Setup
- 1 Primary PostgreSQL node
- 2 Replica nodes
- Replicas use streaming replication to stay up to date
Scenario
- A financial application stores user transactions.
- All writes go to the primary database.
- The replica nodes copy write-ahead logs (WAL) and replay them to stay consistent.
- If the primary node crashes, a replica is promoted using failover tools (like Patroni).
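The idea of shipping a log and replaying it can be sketched with a toy model. This is not PostgreSQL's WAL format or API: `WriteAheadLog`, `Standby`, and the use of a list index as a log sequence number (LSN) are simplifying assumptions for illustration.

```python
class WriteAheadLog:
    """Append-only list of records; the record count acts as a toy LSN."""

    def __init__(self):
        self.records = []

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records)          # LSN of the record just written

    def read_from(self, lsn: int):
        """Return every record written after the given LSN."""
        return self.records[lsn:]


class Standby:
    def __init__(self):
        self.balances = {}
        self.replayed_lsn = 0             # how far this standby has replayed

    def replay(self, wal: WriteAheadLog):
        for account, delta in wal.read_from(self.replayed_lsn):
            self.balances[account] = self.balances.get(account, 0) + delta
            self.replayed_lsn += 1


wal = WriteAheadLog()
wal.append(("alice", +50))
wal.append(("bob", +20))
standby = Standby()
standby.replay(wal)

wal.append(("alice", -30))               # primary keeps writing
standby.replay(wal)                      # standby resumes where it left off
```

Because the standby remembers how far it has replayed, it can resume after a pause without reapplying earlier records.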
Flow
- User makes a transaction → goes to primary node.
- Primary saves the transaction and logs it.
- WAL records are streamed to replicas.
- Replicas apply the changes and update their state.
- Read-only queries go to replicas for load distribution.
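Sending writes to the primary and reads to replicas is often handled by a routing layer. The sketch below is a deliberately naive version: `QueryRouter` and the host names are hypothetical, and classifying statements by their first keyword is an assumption that ignores transactions, DDL, and other cases a real router must handle.

```python
import itertools


class QueryRouter:
    """Send writes to the primary and spread reads across replicas."""

    # Naive classification: anything else is treated as a read.
    WRITE_PREFIXES = ("insert", "update", "delete")

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        statement = sql.lstrip().lower()
        if statement.startswith(self.WRITE_PREFIXES):
            return self.primary
        return next(self._read_cycle)   # round-robin over replicas


router = QueryRouter("pg-primary", ["pg-replica-1", "pg-replica-2"])
targets = [
    router.route("INSERT INTO transactions VALUES (1, 100)"),
    router.route("SELECT * FROM transactions"),
    router.route("SELECT count(*) FROM transactions"),
    router.route("SELECT 1"),
]
```

Reads served this way may be slightly behind the primary, so queries that must see the latest write are usually sent to the primary instead.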
Web Application Replication
Web applications typically store user session state (e.g., login info, cart contents, form data). If this session is stored only in the memory of one server, the user's requests must be routed back to that same server for a consistent experience.
This leads to stateful web app replication where the app servers maintain state and need careful routing.
Sticky Sessions
Sticky sessions (also called session affinity) ensure that a user's requests are always routed to the same server where their session state is stored.
Use Case
- Simple session management (no external session store).
- Useful for small-scale deployments.
Trade-offs
- Load imbalance (some servers may get overloaded).
- Fails if the server crashes (session lost unless session replication is used).
Flow
User A sends login request
→ Routed to App Server 1
→ App Server 1 stores session in memory
→ Sticky session ensures all future requests go to App Server 1
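The routing rule in this flow can be sketched with a hash of a client identifier. Here `client_id` stands in for a client IP or cookie value, and `StickyBalancer` and `AppServer` are hypothetical classes written for this example, not the API of any load balancer.

```python
import hashlib


class StickyBalancer:
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_id: str) -> str:
        """Hash the client identifier so the same client maps to the same server."""
        digest = hashlib.sha256(client_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


class AppServer:
    def __init__(self):
        self.sessions = {}   # session state lives only in this server's memory


servers = {name: AppServer() for name in ("app-1", "app-2", "app-3")}
balancer = StickyBalancer(servers)   # iterating the dict yields server names

home = balancer.pick("user-a")
servers[home].sessions["user-a"] = {"logged_in": True}

# Later requests from the same user land on the same server.
same = all(balancer.pick("user-a") == home for _ in range(5))
```

The session exists on one server only, so if that server crashes or the server list changes, the user's session is lost. That is the trade-off listed above.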
Tools
- NGINX, HAProxy (support sticky sessions via cookies or IP hash)
- AWS ELB (Application Load Balancer supports sticky sessions)
Session Clustering
Session clustering replicates or shares session data across all app server instances, so any server can handle any request, even after another server fails.
Benefits
- High availability
- No reliance on sticky sessions
- Easy to scale horizontally
How it's implemented
- In-memory data grids (e.g., Hazelcast, Apache Ignite)
- Distributed session stores (e.g., Redis, Memcached)
- Servlet container clustering (e.g., Tomcat session replication)
Flow
User A logs in on App Server 1
→ Session is saved in Redis
→ User's next request goes to App Server 2
→ App Server 2 retrieves session from Redis
→ Continues seamlessly
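This flow can be sketched with a shared store that every app server reads and writes. To keep the example self-contained, `SessionStore` is an in-process stand-in for an external store such as Redis; the class and method names are assumptions for illustration, not a Redis client API.

```python
import uuid


class SessionStore:
    """In-process stand-in for an external store such as Redis."""

    def __init__(self):
        self._data = {}

    def save(self, session_id: str, session: dict):
        self._data[session_id] = dict(session)

    def load(self, session_id: str):
        return self._data.get(session_id)


class AppServer:
    def __init__(self, name: str, store: SessionStore):
        self.name = name
        self.store = store            # every server shares the same store

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user, "cart": []})
        return session_id

    def handle(self, session_id: str) -> str:
        session = self.store.load(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {session['user']}"


store = SessionStore()
server_1 = AppServer("app-1", store)
server_2 = AppServer("app-2", store)

sid = server_1.login("user-a")
reply = server_2.handle(sid)          # a different server serves the request
```

Because the app servers hold no session state of their own, any of them can be added, removed, or restarted without logging users out, provided the store itself stays available.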
Database Replication
Databases also require stateful replication to maintain durability, consistency, and availability.
Common Replication Strategies
| Strategy | Description |
|---|---|
| Primary-Replica | One primary handles writes; replicas sync for reads (PostgreSQL, MySQL) |
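| Multi-Master (Multi-Leader) | Multiple nodes accept writes (e.g., Cassandra, which is leaderless; CockroachDB, which accepts writes on any node and coordinates them through Raft) |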
| Quorum-Based | Distributed consensus (e.g., Raft in etcd or Consul) |
Flow Example
- Primary DB handles transaction writes
- Data is streamed (e.g., via the WAL) to one or more replica DBs
- Read replicas handle heavy read operations (e.g., for analytics, reports)
- On failure, a failover mechanism promotes a replica to become primary
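The failover step can be sketched as a small health-check loop. This is a minimal illustration under stated assumptions: `FailoverMonitor`, the three-missed-heartbeats threshold, and choosing the replica with the highest replayed position are all hypothetical choices, not how any specific failover tool works.

```python
class FailoverMonitor:
    """Promote a replica after the primary misses too many heartbeats."""

    def __init__(self, primary: str, replica_positions: dict, max_missed: int = 3):
        self.primary = primary
        self.replica_positions = dict(replica_positions)  # name -> replayed position
        self.max_missed = max_missed
        self.missed = 0

    def heartbeat(self, primary_responded: bool) -> str:
        """Record one health check and return the current primary."""
        if primary_responded:
            self.missed = 0
            return self.primary
        self.missed += 1
        if self.missed >= self.max_missed and self.replica_positions:
            self._promote()
        return self.primary

    def _promote(self):
        # Prefer the replica that has replayed the most data.
        best = max(self.replica_positions, key=self.replica_positions.get)
        del self.replica_positions[best]
        self.primary = best
        self.missed = 0


monitor = FailoverMonitor("db-primary", {"db-replica-1": 120, "db-replica-2": 135})
history = [monitor.heartbeat(ok) for ok in (True, False, False, False)]
```

Requiring several consecutive misses avoids promoting a replica because of a single dropped health check. Production tools also have to guard against two nodes both acting as primary, which this sketch does not attempt.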
Example Architecture
                       +-------------------------+
                       |      Load Balancer      |
                       +------------+------------+
                                    |
              Sticky Sessions OR Stateless (Session Store)
                                    |
              +---------------------+---------------------+
              |                     |                     |
    +---------v--------+  +---------v--------+  +---------v--------+
    |   App Server 1   |  |   App Server 2   |  |   App Server 3   |
    | (stores session  |  | (or shares via   |  | (Redis/Memcache  |
    |  or uses Redis)  |  | session cluster) |  |  or Hazelcast)   |
    +---------+--------+  +---------+--------+  +---------+--------+
              |                     |                     |
              +---------------------+---------------------+
                                    |
                          +---------v--------+
                          |  Redis Cluster   |
                          +---------+--------+
                                    |
                          +---------v--------+
                          |    Primary DB    |
                          +---------+--------+
                                    |
                          +---------v--------+
                          | Read Replica(s)  |
                          +------------------+